RapidMiner Studio documentation, version 10.0
Local Interpretation (LIME) (Operator Toolbox)
Synopsis
This meta operator generates an approximation of the decision a given (complex) model made for specific examples. The key idea is to generate local feature weights (interpretations) that are easier to interpret and thus help to understand, on a per-example basis, the reasoning behind a decision of a complex model.
Description
To do this we run the following algorithm:
- 1. Draw uniformly distributed random data in [0,1] for all attributes
- 2. Scale them to the same min/max as the input data
- 3. Score them with the complex model
- For each example in your input set do:
- 4. Calculate the Euclidean distance d between the example and your random examples (normalized)
- 5. Use w = sqrt(exp(-(pow(d,2)/pow(this.kernel_width,2)))) as a weight for the local model
- 6. Run the inner process, which creates a feature weight (e.g. via Weight by XXX, Linear Regression or Logistic Regression)
- 7. Add the top k attributes with their importance to the input set
As a result you get an ExampleSet with new attributes describing the most important local attributes, and a collection of attribute weights containing the full vector. If you calculate a performance vector in the inner process and connect it to the Per port, the performance of the local description is attached to every example.
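The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the operator's actual implementation: the function names `local_interpretation` and `weighted_correlation` are hypothetical, the complex model is assumed to be any callable that maps a row to a number, and a weighted correlation stands in for the inner process of step 6. Note that the weight in step 5 is algebraically equal to exp(-d^2 / (2 * kernel_width^2)), i.e. a Gaussian kernel.

```python
import math
import random


def weighted_correlation(xs, ys, ws):
    """Weighted Pearson correlation; an assumed stand-in for the inner process."""
    total = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / total
    my = sum(w * y for w, y in zip(ws, ys)) / total
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    vx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    vy = sum(w * (y - my) ** 2 for w, y in zip(ws, ys))
    denom = math.sqrt(vx * vy)
    return cov / denom if denom > 0 else 0.0


def local_interpretation(examples, model, kernel_width,
                         sample_size=500, top_k=2, seed=0):
    """Hypothetical sketch of steps 1-7 for purely numerical attributes."""
    rng = random.Random(seed)
    n_atts = len(examples[0])
    mins = [min(row[j] for row in examples) for j in range(n_atts)]
    maxs = [max(row[j] for row in examples) for j in range(n_atts)]

    # Step 1: uniformly distributed random data in [0, 1] for all attributes.
    unit = [[rng.random() for _ in range(n_atts)] for _ in range(sample_size)]
    # Step 2: scale to the same min/max as the input data.
    scaled = [[mins[j] + u[j] * (maxs[j] - mins[j]) for j in range(n_atts)]
              for u in unit]
    # Step 3: score the random examples with the complex model.
    scores = [model(row) for row in scaled]

    results = []
    for ex in examples:
        # Step 4: Euclidean distance in the normalized [0, 1] space.
        ex_unit = [(ex[j] - mins[j]) / (maxs[j] - mins[j])
                   if maxs[j] > mins[j] else 0.0
                   for j in range(n_atts)]
        dists = [math.dist(ex_unit, u) for u in unit]
        # Step 5: kernel weight for every random example.
        weights = [math.sqrt(math.exp(-(d ** 2) / kernel_width ** 2))
                   for d in dists]
        # Step 6: one importance value per attribute from the stand-in learner.
        importance = [weighted_correlation([u[j] for u in unit], scores, weights)
                      for j in range(n_atts)]
        # Step 7: keep the top k attributes (by absolute importance).
        ranked = sorted(range(n_atts), key=lambda j: abs(importance[j]),
                        reverse=True)
        results.append([(j, importance[j]) for j in ranked[:top_k]])
    return results
```

For example, calling `local_interpretation(rows, lambda r: 3.0 * r[0], kernel_width=0.5)` on two-attribute rows should rank attribute 0 first for every example, since the assumed model ignores attribute 1.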
The algorithm is very similar to LIME. Details on LIME can be found here:
- https://homes.cs.washington.edu/~marcotcr/blog/lime/
- https://arxiv.org/pdf/1602.04938v1.pdf
- https://github.com/marcotcr/lime
Input
- exa (Data Table)
The ExampleSet you want to get interpretations for. Needs to have a reasonable size to estimate min and max.
- mod (Model)
The (complex) input model.
Output
- exa (Data Table)
ExampleSet with local interpretations.
- mod (Model)
The passed through input model.
- wei
A collection of Attribute Weights for each example.
- loc
The collection of local models.
Parameters
- use_locality_heuristics If this parameter is set to true, the locality heuristic derived from LIME (0.2*sqrt(#atts)) is used; otherwise the locality has to be set manually. Range:
- locality A factor describing how local the model should be. The smaller this value, the more localized the model. It is used as kernel_width in step 5. Range:
- sample_size Number of random examples drawn to build the local models on. Range:
- number_of_attributes Number of attributes put into the ExampleSet in step 7. All attribute weights are delivered via the 'wei' port. Range:
- weight_threshold A threshold to remove all examples with weights smaller than this value in each iteration. This removes irrelevant (non-local) random examples from the learning and can significantly speed up the operator. Range:
- use_local_random_seed This parameter indicates if a local random seed should be used. Range:
- local_random_seed If the use_local_random_seed parameter is checked, this parameter determines the local random seed. Range:
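How the locality and weight_threshold parameters interact with step 5 can be illustrated with a small Python sketch. The helper names `kernel_width` and `keep_local` are hypothetical and only mirror the parameter descriptions above; the operator's internal code may differ.

```python
import math


def kernel_width(n_attributes, use_locality_heuristics=True, locality=None):
    """Return the kernel width used in step 5 (assumed helper, not operator code)."""
    if use_locality_heuristics:
        # Heuristic quoted in the parameter description: 0.2 * sqrt(#atts).
        return 0.2 * math.sqrt(n_attributes)
    if locality is None:
        raise ValueError("locality must be set when the heuristic is disabled")
    return locality


def keep_local(rows, weights, weight_threshold):
    """Drop random examples whose weight is smaller than the threshold."""
    return [(row, w) for row, w in zip(rows, weights) if w >= weight_threshold]
```

With the four attributes of the Iris data used in the tutorial processes, the heuristic would give a kernel width of 0.2 * sqrt(4) = 0.4. Raising the threshold shrinks the training set of each local model, which is where the speed-up described for weight_threshold comes from.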
Tutorial Processes
Use Linear Regression to Interpret Deep Learning
Deep Learning Model on Iris interpreted by local Linear Regressions.
Use Weight by Gini Index to Interpret a GBT
Get Local Interpretations by using Weight By Gini Index for a GBT model trained on Iris.
Use an optimized Decision Tree to explain a GBT
In this tutorial we try to interpret GBT results on the Iris data set. To do this we optimize the depth of a decision tree using a nested Optimize Parameters operator.